Toxicology


Explainable Molecular Property Prediction: Aligning Chemical Concepts with Predictions via Language Models

arXiv.org Artificial Intelligence

Providing explainable molecular property predictions is critical for many scientific domains, such as drug discovery and material science. Although transformer-based language models have shown great potential for accurate molecular property prediction, they neither provide chemically meaningful explanations nor faithfully reveal molecular structure-property relationships. In this work, we develop a new framework for explainable molecular property prediction based on language models, dubbed Lamole, which provides explanations aligned with chemical concepts. We first leverage a designated molecular representation -- Group SELFIES -- because it provides chemically meaningful semantics. Because attention mechanisms in Transformers inherently capture relationships within the input, we further combine attention weights with their gradients to generate explanations that capture functional group interactions. We then carefully craft a marginal loss that explicitly optimizes the explanations to align with chemists' annotations. We bridge the manifold hypothesis with this marginal loss to prove that the loss aligns the explanations with the tangent space of the data manifold, yielding concept-aligned explanations. Experimental results on six mutagenicity datasets and one hepatotoxicity dataset demonstrate that Lamole achieves comparable classification accuracy and boosts explanation accuracy by up to 14.8%, making it the state of the art in explainable molecular property prediction.
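
As a rough illustration of the attention-plus-gradient idea described above (not the authors' Lamole implementation), the following Python sketch scores toy Group SELFIES-style tokens by multiplying a single-head attention map with its gradient from a pooled property logit; the tokens, dimensions, and model here are all assumptions made for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical Group SELFIES-style tokens; a real pipeline would tokenize a molecule.
tokens = ["[C][C][Ring1]", "[N+1]", "[O-1]", "[Branch1]"]
d = 16
emb = nn.Embedding(len(tokens), d)
Wq, Wk, Wv = (nn.Linear(d, d, bias=False) for _ in range(3))
clf = nn.Linear(d, 1)

x = emb(torch.arange(len(tokens)))                 # (seq, d) token embeddings
q, k, v = Wq(x), Wk(x), Wv(x)
attn = F.softmax(q @ k.T / d ** 0.5, dim=-1)       # (seq, seq) attention map
attn.retain_grad()                                 # keep the gradient of the map itself
logit = clf((attn @ v).mean(dim=0)).squeeze()      # pooled "property" logit
logit.backward()

# Token attribution ~ attention weights times their gradients, summed over queries.
scores = (attn * attn.grad).sum(dim=0)
for tok, s in zip(tokens, scores.tolist()):
    print(f"{tok:>14s}  {s:+.4f}")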


Explainable machine learning for predicting shellfish toxicity in the Adriatic Sea using long-term monitoring data of HABs

arXiv.org Artificial Intelligence

In this study, explainable machine learning techniques are applied to predict the toxicity of mussels in the Gulf of Trieste (Adriatic Sea) caused by harmful algal blooms. Analysing a newly created 28-year dataset containing records of toxic phytoplankton in mussel farming areas and toxin concentrations in mussels (Mytilus galloprovincialis), we train ML models and evaluate how accurately they predict diarrhetic shellfish poisoning (DSP) events. The random forest model provided the best prediction of positive toxicity results based on the F1 score. Explainability methods such as permutation importance and SHAP identified key species (Dinophysis fortii and D. caudata) and environmental factors (salinity, river discharge and precipitation) as the best predictors of DSP outbreaks. These findings are important for improving early warning systems and supporting sustainable aquaculture practices.
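
A minimal sketch of the described workflow, on synthetic data rather than the study's 28-year monitoring dataset: a random forest classifier for DSP events, evaluated with the F1 score and explained with scikit-learn's permutation importance. The feature names and the toy label rule are illustrative assumptions.

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "dinophysis_fortii": rng.poisson(30, n),     # cells/L, hypothetical scale
    "dinophysis_caudata": rng.poisson(10, n),
    "salinity": rng.normal(37, 1, n),
    "river_discharge": rng.gamma(2.0, 50, n),
    "precipitation": rng.gamma(1.5, 10, n),
})
# Toy label: DSP events driven mainly by D. fortii abundance and lower salinity.
y = ((X["dinophysis_fortii"] > 35) & (X["salinity"] < 37)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print("F1 on held-out data:", round(f1_score(y_te, rf.predict(X_te)), 3))

# Permutation importance on the held-out split, scored by F1.
imp = permutation_importance(rf, X_te, y_te, scoring="f1", n_repeats=20, random_state=0)
for name, mean in sorted(zip(X.columns, imp.importances_mean), key=lambda t: -t[1]):
    print(f"{name:>20s}  {mean:+.3f}")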


Hidden Flaws Behind Expert-Level Accuracy of GPT-4 Vision in Medicine

arXiv.org Artificial Intelligence

Recent studies indicate that Generative Pre-trained Transformer 4 with Vision (GPT-4V) outperforms human physicians on medical challenge tasks. However, these evaluations have primarily focused on multiple-choice accuracy alone. Our study extends this scope by comprehensively analysing GPT-4V's rationales for image comprehension, recall of medical knowledge, and step-by-step multimodal reasoning when solving New England Journal of Medicine (NEJM) Image Challenges - an imaging quiz designed to test the knowledge and diagnostic capabilities of medical professionals. Evaluation results confirmed that GPT-4V outperforms human physicians in multiple-choice accuracy (88.0% vs. 77.0%, p=0.034). GPT-4V also performs well on cases that physicians answer incorrectly, with over 80% accuracy. However, we found that GPT-4V frequently presents flawed rationales even when it makes the correct final choice (27.3% of cases), most prominently in image comprehension (21.6%). Despite GPT-4V's high multiple-choice accuracy, our findings emphasize the need for further in-depth evaluation of its rationales before integrating such models into clinical workflows.


Predictive toxicology evolving from in vivo to in vitro to in silico systems

#artificialintelligence

A team of researchers at the Laboratory for Health Protection of the National Institute of Public Health and the Environment in Bilthoven, The Netherlands, in collaboration with the German Centre for the Protection of Laboratory Animals (Bf3R) at the German Federal Institute for Risk Assessment (BfR) in Berlin, Germany, and the Utrecht Institute of Pharmaceutical Sciences of Utrecht University, Utrecht, The Netherlands, emphasizes the need for microphysiological systems to support innovations in organoids and organ-on-chip microfluidic devices (Schneider et al., 2021). According to the investigators, rigorous evaluation of the potentially toxic effects of chemicals, including pharmaceutical compounds, on human and environmental health remains difficult. The complexity of biological processes and the limited accessibility of in vivo experiments exacerbate this challenge. Over the past few years, researchers have therefore turned to model systems ranging from single cell lines to complex animal models. In the past five years, microphysiological systems that mimic human physiology on a small scale have attracted great attention.


A Roadmap to Asymptotic Properties with Applications to COVID-19 Data

arXiv.org Artificial Intelligence

A good estimator should, at least in the asymptotic sense, be close to the true quantity it aims to estimate, and we should be able to give an uncertainty measure based on a finite sample size. An estimator with well-behaved asymptotic properties can help clinicians in many ways, such as reducing the number of patients needed in a trial, cutting the budget for toxicology studies, and providing insightful findings for late-phase trials. Following the suggestion of Sir Ronald Fisher [1], generations of statisticians have worked on the so-called "consistency" and "asymptotic normality" of estimators. The former rests on different versions of the law of large numbers (LLN) and the latter on various types of central limit theorems (CLT) [2]. In addition to these two main tools, statisticians also apply other important but less well-known results from probability theory and other mathematical fields: extreme value theory for distributions of maxima and minima [3], convex analysis for checking the optimality of a statistical design [4], the asymptotic relative efficiency (ARE) of an estimator [5], concentration inequalities for finite-sample properties and selection consistency [6], and other non-normal limits, robustness, and simultaneous confidence bands of common statistical estimators [7, 8]. Despite these other properties, consistency and asymptotic normality remain the most celebrated and important properties of statistical estimators in both academia and industry. Hence, in the following, we present a roadmap to consistency and asymptotic normality, and then illustrate their applications in toxicology studies and clinical trials using a COVID-19 dataset.
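
The two properties can be illustrated with a short simulation, shown below on skewed exponential "time-to-event" data chosen purely for illustration: the sample mean drifts toward the true mean as n grows (consistency, via the LLN), and the standardized sample mean is approximately standard normal (asymptotic normality, via the CLT).

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
mu, sigma = 2.0, 2.0                       # mean and sd of an Exponential with scale 2

# Consistency: the estimation error of the sample mean shrinks as n grows.
for n in (10, 100, 1000, 10000):
    xbar = rng.exponential(scale=mu, size=n).mean()
    print(f"n={n:>6d}  |x_bar - mu| = {abs(xbar - mu):.4f}")

# Asymptotic normality: standardized sample means over many replications
# look standard normal even though the underlying data are skewed.
n, reps = 200, 5000
z = (rng.exponential(scale=mu, size=(reps, n)).mean(axis=1) - mu) / (sigma / np.sqrt(n))
print("mean, sd of z:", round(z.mean(), 3), round(z.std(), 3))   # close to 0 and 1
print("Kolmogorov-Smirnov p-value vs N(0,1):", round(stats.kstest(z, "norm").pvalue, 3))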


Diagnosis of Acute Poisoning Using Explainable Artificial Intelligence

arXiv.org Artificial Intelligence

Medical toxicology is the clinical specialty that treats the toxic effects of substances, be it an overdose, a medication error, or a scorpion sting. The volume of toxicological knowledge and research has, as in other medical specialties, outstripped the ability of the individual clinician to fully master and stay current with it. Applying machine learning techniques to medical toxicology is challenging because initial treatment decisions are often based on a few pieces of textual data and rely heavily on prior knowledge. ML techniques often do not represent knowledge in a way that is transparent to the physician, raising barriers to usability. Rule-based systems and decision tree learning are more transparent approaches, but they often generalize poorly and require expert curation to implement and maintain. Here, we construct a probabilistic logic network to represent a portion of the knowledge base of a medical toxicologist. Our approach transparently mimics the knowledge representation and clinical decision-making of practicing clinicians. The software, dubbed Tak, performs comparably to humans on straightforward and intermediate-difficulty cases, but is outperformed by humans on challenging clinical cases. Tak outperforms a decision tree classifier at all levels of difficulty. Probabilistic logic provides one form of explainable artificial intelligence that may be more acceptable for use in healthcare, if it can achieve acceptable levels of performance.
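
To make the probabilistic-logic idea concrete without reproducing the authors' Tak system, here is a hand-rolled noisy-OR sketch that maps a few clinical findings to candidate toxidromes; all findings, toxidromes, and rule weights are illustrative assumptions, not values from the paper.

# Illustrative noisy-OR rules (not the Tak knowledge base).
RULES = {
    # toxidrome: {finding: weight, i.e. how strongly the finding supports it}
    "anticholinergic": {"dry_skin": 0.7, "mydriasis": 0.5, "tachycardia": 0.4},
    "cholinergic":     {"salivation": 0.8, "miosis": 0.6, "bradycardia": 0.5},
    "sympathomimetic": {"diaphoresis": 0.6, "mydriasis": 0.5, "tachycardia": 0.6},
}

def noisy_or(toxidrome, findings, leak=0.05):
    """P(toxidrome) under a noisy-OR combination of the weights of observed findings."""
    p_not = 1 - leak
    for finding, weight in RULES[toxidrome].items():
        if finding in findings:
            p_not *= 1 - weight
    return 1 - p_not

observed = {"mydriasis", "tachycardia", "dry_skin"}   # hypothetical textual findings
for tox in RULES:
    print(f"{tox:>16s}  P = {noisy_or(tox, observed):.2f}")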


A comparison of machine learning algorithms for chemical toxicity classification using a simulated multi-scale data model

#artificialintelligence

A daunting challenge facing environmental regulators in the U.S. and other countries is the requirement to evaluate the potential toxicity of a large number of chemicals currently in common use (in the range of 10,000–30,000) for which little toxicology information is available. The time and cost required for traditional toxicity testing approaches, coupled with the desire to reduce animal use, are driving the search for new toxicity prediction methods [1–3]. Several efforts are starting to address this information gap by using relatively inexpensive, high-throughput screening approaches to link chemical and biological space [1, 4–21]. The U.S. EPA is carrying out one such large screening and prioritization experiment, called ToxCast, whose goal is to develop predictive signatures or classifiers that can accurately predict whether a given chemical will or will not cause particular toxicities [4]. This program is investigating a variety of chemically induced toxicity endpoints, including developmental and reproductive toxicity, neurotoxicity, and cancer. The initial training set comes from a collection of 300 pesticide active ingredients for which complete rodent toxicology profiles have been compiled. This set of chemicals will be tested in several hundred in vitro assays.
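
A minimal sketch of such a comparison on simulated data (not the ToxCast assays): several scikit-learn classifiers are trained on synthetic in vitro assay readouts to predict a binary in vivo toxicity endpoint and compared by cross-validated balanced accuracy. The choice of algorithms, feature count, and class balance are assumptions for illustration.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# ~300 chemicals x 200 simulated assay features, only a few of them informative.
X, y = make_classification(n_samples=300, n_features=200, n_informative=15,
                           weights=[0.7, 0.3], random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=2000),
    "svm_rbf": SVC(),
    "random_forest": RandomForestClassifier(n_estimators=300, random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="balanced_accuracy")
    print(f"{name:>20s}  balanced accuracy = {scores.mean():.3f} +/- {scores.std():.3f}")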